perf: defer timezonefinder & pandas imports to cut startup time #482
Open
mrosseel wants to merge 177 commits into
- build.yml: single build + Cachix push + unstable channel updates
- release.yml: manual release workflow for stable/beta channels
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The SD image module provides filesystems, but toplevel builds need a minimal stub to evaluate successfully. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Required for NixOS module system to accept devMode setting. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Required when module has both options and config sections. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Replaces FIXME placeholders with actual SRI hashes. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Uses Pi5 runner when RUNNER_LABELS variable is set, falls back to ubuntu with QEMU emulation otherwise. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Filter to only Pi 4B device tree (CM4 incompatible with our overlays)
- Use shorthand DTS syntax for PWM overlay
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Untracked file was excluded from Nix flake source tree, causing "No module named 'PiFinder.sys_utils_base'" on SD card boot. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add camera overlay (imx477) to netboot config.txt via flake.nix
- Fix sys_utils import in main.py to use utils.get_sys_utils()
- Add hip_main.dat fetch to pifinder-src.nix for starfield plotting
- Add dma_heap udev rule for libcamera/picamera2 access
- Fix shared memory naming in solver.py (remove leading /)
- Add DNS nameservers for netboot environment
- Document power control scripts in CLAUDE.md
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add runtimeCameraSelection option to hardware.nix (default: true)
- SD image includes config.txt with "include camera.txt" directive
- Users can edit camera.txt and reboot to switch cameras
- Supported cameras: imx296, imx290 (imx462), imx477
- Fix cameraDriver scope in hardware.nix (moved to top-level let)
- Add sudoers rules for systemctl stop/start pifinder.service
- Add DMA heap udev rule for libcamera video group access
- Netboot config sets cameraType = "imx477" for HQ camera dev
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Refactor sys_utils modules to use common base class
- Add sys_utils_nixos.py for NixOS-specific implementations
- Add get_sys_utils() detection in utils.py for platform selection
- Add flake.lock for reproducible builds
- Add NetworkManager config to networking.nix
- Add deploy-image-to-nfs.sh for netboot development workflow
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Update build.yml CI workflow
- Fix fonts.py import
- Fix marking_menus.py formatting
- Add missing import to preview.py
- Simplify objects_db.py
- Add catalog_imports improvements
- Update pifinder_objects.db
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Switch to NFSv4 with caching disabled (noac, actimeo=0)
- Disable auto-optimise-store in devMode (hard links fail on NFS)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add ServerAliveInterval/CountMax to prevent timeout during transfers
- Use rsync -R (relative) to preserve directory structure correctly
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Comets.txt is downloaded at runtime and must be in a writable location, not the read-only Nix store. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Extend eth0 wait to 30 seconds with debug output
- Wait for link carrier before DHCP
- Add DHCP retries (3 attempts)
- Add LIBCAMERA_IPA_MODULE_PATH to pifinder service environment
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Restore SUBSYSTEM=="pwm" udev rule that was accidentally removed. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Turns on keypad LEDs during sysinit for early visual boot feedback. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- boot-splash.c: displays welcome image with scanning animation
- Starts at sysinit, stops when pifinder.service starts
- Much faster than Python splash
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove nixos-hardware module (saves 659MB linux-firmware)
- Fetch nixos-rebuild at runtime (saves ~500MB llvm/nix deps)
- Remove git from systemPackages (nix has built-in git for flakes)
Target: ~150MB vs current 1.7GB
- Remove default packages (vim, nano, etc)
- Disable polkit, udisks2, speechd
- Should reduce closure significantly
NetworkManager-vpnc alone has 1.1GB closure (webkitgtk, llvm, etc). Disable all NM plugins for bootstrap - we just need WiFi.
- nixos/RELEASE.md: document version flow + release/dev pipelines
- software.py: MIN_NIXOS_VERSION 2.5.0 → 3.0.0
- python-packages.nix: add pyerfa (used by calc_utils since upstream brickbots#423, silently dropped during upstream merge because requirements.txt is not mirrored into the Nix env)
- python-packages.nix: include hardwarePackages in devEnv so nix develop matches the runtime import surface
- python-packages.nix: select simplejpeg wheel by host arch (was hard-pinned to aarch64; failed to import on x86_64 dev shells)
- flake.nix: apply libcamera -Dpycamera=enabled overlay to the x86_64 devShell and export PYTHONPATH so picamera2 finds the python bindings
Verified: nix develop --command python -c 'import …' on x86_64 succeeds for all 34 imports (erfa, picamera2, libcamera, PyHotKey, pynput, hardware packages, etc.). RPi.GPIO still raises its own "only on a Raspberry Pi" RuntimeError at import time (expected, matches upstream pip behavior on non-Pi hardware).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1. gpsd-add-uart: rename /dev/ttyAMA3 → /dev/ttyAMA1 (6 sites). The uart3 overlay surfaces as ttyAMA1, matching hardware.nix's udev rule and the Debian image's gpsd.conf.
2. /etc/default/gpsd: drop the custom USBAUTO+GPSD_SOCKET pair, write upstream pi_config_files/gpsd.conf's three lines verbatim. DEVICES now opens the on-board UART at startup. gpsd-add-uart kept as the boot-time socket-activation kick; can retire after on-Pi confirmation.
3. pifinder-upgrade: replace fragile `nix build --dry-run | grep` progress with `nix --log-format internal-json build … --max-jobs 0` parsed by gawk, counting type=100 (actCopyPath) start/stop events. Stable across Nix ≥ 2.4. Validated against a real cache.nixos.org substitute (5/5).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
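The progress counting in item 3 can be sketched in Python (the commit itself uses gawk, which is not reproduced here). This is a minimal sketch under stated assumptions: each structured log line is prefixed with `@nix ` and followed by a JSON object, `start` events carry `id` and `type`, and `stop` events carry only `id`, so the sketch remembers which ids belong to type 100. The function name `count_copy_progress` and the sample lines are hypothetical.

```python
import json

ACT_COPY_PATH = 100  # activity type the commit counts (actCopyPath)
PREFIX = "@nix "     # assumed marker in front of each structured log line


def count_copy_progress(lines):
    """Return (started, finished) counts of copy-path activities."""
    copy_ids = set()
    started = finished = 0
    for line in lines:
        if not line.startswith(PREFIX):
            continue  # ordinary build output, not a structured event
        try:
            event = json.loads(line[len(PREFIX):])
        except json.JSONDecodeError:
            continue  # ignore malformed lines instead of aborting
        action = event.get("action")
        if action == "start" and event.get("type") == ACT_COPY_PATH:
            copy_ids.add(event.get("id"))
            started += 1
        elif action == "stop" and event.get("id") in copy_ids:
            copy_ids.discard(event.get("id"))
            finished += 1
    return started, finished


# Synthetic sample lines, written for this sketch (not captured from a real run).
sample = [
    '@nix {"action":"start","id":1,"type":100,"text":"copying path A"}',
    '@nix {"action":"start","id":2,"type":105,"text":"building B"}',
    '@nix {"action":"stop","id":1}',
    "plain text that is not a structured event",
    '@nix {"action":"start","id":3,"type":100,"text":"copying path C"}',
]
print(count_copy_progress(sample))
```

Tracking ids matters because a `stop` event alone does not say which activity type it closes; counting every `stop` would overstate progress whenever other activities finish.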
build.yml was defaulting VERSION=2.5.0 on push triggers (and the workflow_dispatch default also read 2.5.0), so this branch's auto-build was publishing v2.5.0-migration tarballs while the migration branch's downloader (software.py _MIGRATION_VERSION_INFO and the brickbots/PiFinder release branch's migration_gate.json) points at v3.0.0-migration. Bump both the workflow_dispatch default and the push-trigger fallback to 3.0.0 so a normal push to nixos publishes the artifact at the URL the migration branch actually downloads. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Lint & Test workflow was using DeterminateSystems/magic-nix-cache-action,
which is backed by GitHub's Actions Cache and gets HTTP-418 rate-limited
under sustained traffic — exactly the failure mode that just broke
type-check ("--install-types failed: substituter disabled, rate limit
exceeded"). The Nix substituter is then disabled mid-run and dependent
commands like mypy --install-types fall over.
Replace it with cachix/cachix-action@v17 pointed at the pifinder cache
(read-only, no auth token needed). Same backing as build.yml, so dev-shell
substitutes hit the same store paths the system closure was built against.
cache.nixos.org remains the default fallback.
Also bump actions/checkout@v4 → @v6 in this file to align with the Node 24
migration in build.yml/release.yml.
This is a stop-gap. The real fix is standing up Attic with an S3 backend
so both build.yml and lint.yml can retire cachix.org and MNC together —
tracked separately.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rong) Previous commit on this branch swapped DeterminateSystems/magic-nix-cache-action for cachix/cachix-action@v17 thinking the MNC HTTP-418 rate-limit was the root cause of the failed lint/type-check. That swap made things worse: the pifinder Cachix only contains the NixOS *system closure*, not the *dev shell* (cedar-detect-server's Rust crate builds). With MNC removed, the dev shell had to rebuild from source, which fetched crate tarballs from a crates.io mirror and hit 403s. MNC was carrying real weight by caching locally-built derivations between runs. Restoring it. The original MNC rate-limit was a transient flake — re-runs work around it. Real fix is standing up Attic with S3-backed storage so both build.yml and lint.yml can retire MNC and cachix.org together. The checkout@v4 → @v6 bump from the swap commit is preserved. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Nix derivation was overwriting pifinder-build.json with
"nix-${gitRev}" at build time, so even released devices reported a
random short-sha instead of the release version. Three writers became
two, with consistent semantics everywhere:
- pifinder-src.nix: drop the cat > pifinder-build.json block and the
gitRev arg — the derivation now copies the source file through
verbatim, no version invention.
- flake.nix: drop the pifinderGitRev _module.args plumbing.
- services.nix: drop pifinderGitRev / gitRev from the pifinder-src
import.
- release.yml: reorder so the version stamp is written into the
working tree BEFORE the nix build (so the store path bakes in the
release version, not the previous stamp), then re-stamp with the
resulting store_path after the build, commit, push, tag.
Result: SD image, cachix closure, and committed JSON all agree on
the released version. Matches the flow already documented in
nixos/RELEASE.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Captures the decision to self-host Attic at cache.pifinder.eu, backed by SQLite + local disk initially with Cloudflare R2 as the eventual chunk store. Covers considered alternatives (cachix.org, Magic Nix Cache, nix-casync, harmonia) and the operational consequences for CI publishing, on-device updates, and failure fall-through to cache.nixos.org. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces pifinder.cachix.org with the self-hosted Attic instance at https://cache.pifinder.eu/pifinder (ADR 0004) on both the device-side substituter list and the CI publish path. cachix.org is removed entirely with no fallback, so this is the first build that proves Attic stands on its own.
services.nix:
  substituters = ["https://cache.pifinder.eu/pifinder" "https://cache.nixos.org"]
  trusted-public-keys = ["pifinder:8UU/O3oLkaJHHUyqEcPGl+9F1m4MqDca39Ewl49jBmE=" "cache.nixos.org-1:..."]
(pifinder.cachix.org and its key removed.)
build.yml (build-native, build-emulated, build-migration-tarball):
- remove cachix/cachix-action steps
- remove `cachix push` (replaced by `attic push pifinder:pifinder`)
- add a "Setup Attic substituter" step that runs
    nix profile install nixpkgs#attic-client
    attic login pifinder https://cache.pifinder.eu "$ATTIC_TOKEN"
    attic use pifinder:pifinder
  before the build, so the build itself substitutes from cache.pifinder.eu.
build-emulated swaps cachix/install-nix-action for DeterminateSystems/nix-installer-action, so no cachix dependencies are left.
First post-cachix build is expected to be slow: pifinder.cachix.org's warmed-up paths are gone. Once it completes and `attic push` lands, subsequent builds substitute from cache.pifinder.eu.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…cceed)
Brickbots/PiFinder runs the workflow on pull_request:synchronize from PR brickbots#379 (mrosseel:nixos -> brickbots:main). With ATTIC_TOKEN now also set on brickbots, build-emulated's 'Push to Attic' step (the last failing step) succeeds. But stamp-build then tries to checkout the PR head ref (mrosseel:nixos) and 'git push' a pifinder-build.json commit there, which can't work from brickbots' Actions runner (no write access to the fork). The PR run therefore failed at the stamp step even after attic was wired correctly.
Gating stamp-build on github.event_name == 'push' keeps the canonical stamp on the mrosseel:nixos push run (where it works) and skips it on brickbots PR runs (which only need to verify the build).
Net effect: both repos' CI runs in parallel without stomping:
- Both build and push the same closure to cache.pifinder.eu (attic FastCDC-dedups, so the second push is a no-op),
- Only mrosseel stamps pifinder-build.json,
- build-migration-tarball already gates on github.ref == refs/heads/nixos so it only runs on mrosseel push.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sync the fork with upstream up to 0e0ec03 (39 commits since the last sync at e4b623a): typed positioning model (PointingEstimate / ImuSample / AlignedResult), polar alignment, comet vectorization, single-instance lock, resolution-flexible UI, i18n fonts.
Conflict resolution (18 files):
- Adopt upstream's typed positioning model across solver, integrator, state, imu_pi, status, main, base, console, object_details, fonts, menu_structure, auto_exposure, camera_interface.
- Keep NixOS layers: software.py (store-path upgrade UI), utils.py (build_json, writable comet_file, robust pifinder_dir), sys_utils.py.
- plot.py: restore top-level `import pandas` (upstream added module-level uses; the fork had made it lazy) and drop the redundant lazy imports.
- solver.py taken verbatim from upstream; the cedar-detect dev-spawn is proposed upstream separately (brickbots#478).
- Drop fork-deleted nox/pip tooling (requirements.txt, version.txt).
deps: add xlrd to python-packages.nix (pyerfa already present).
Verification: ruff clean; 479 unit tests pass. Remaining failures are not from this merge -- 6 test_software + 4 test_t9_search pre-exist on origin/nixos; 5 test_comets fail on skyfield 1.53 (upstream vectorization, has a runtime fallback).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Skyfield's propagate() lays a batched Kepler orbit out as (3, #orbits, #times) but sets output_shape = (3,) + t1.shape, so a batched orbit propagated to a single scalar time raises "cannot reshape array of size 3N into shape (3,)" on skyfield >= 1.46. (The fork uses nixpkgs' skyfield 1.53; upstream pins 1.45, which tolerated it.) Give every comet the same target time as an (N, 1) column so output_shape matches the (3, N, 1) result, then squeeze the time axis. Verified against the per-comet path (0 AU difference) and the existing tests/test_comets.py oracle (7/7 pass). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
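The shape mismatch and the workaround described above can be illustrated with NumPy alone. This is a toy model written for this note, not Skyfield code: `toy_propagate` is a hypothetical stand-in that only mimics the shape bookkeeping the commit message describes (result laid out as `(3, #orbits, #times)`, then reshaped to `(3,) + t.shape`), and `positions_at` is likewise a made-up name.

```python
import numpy as np


def toy_propagate(n_orbits, t):
    """Hypothetical stand-in that mimics only the shape handling, no orbital math."""
    t = np.asarray(t, dtype=float)
    n_times = t.shape[-1] if t.ndim else 1
    # The batched result is laid out as (3, #orbits, #times) ...
    raw = np.zeros((3, n_orbits, n_times))
    # ... but is reshaped to (3,) + t.shape, which only fits when t
    # itself carries the orbit axis.
    return raw.reshape((3,) + t.shape)


def positions_at(n_orbits, when):
    """Give every orbit the same target time as an (N, 1) column, then squeeze."""
    t_column = np.full((n_orbits, 1), when)
    return toy_propagate(n_orbits, t_column).squeeze(axis=-1)


try:
    toy_propagate(4, 2460000.5)  # batched orbits, single scalar time
except ValueError as exc:
    print("scalar time fails:", exc)

print(positions_at(4, 2460000.5).shape)  # (3, 4)
```

In the toy model the scalar case fails because 3N values cannot be reshaped into `(3,)`, which is the same kind of error the commit quotes; the column-shaped time makes the requested output shape agree with the data before the singleton time axis is dropped.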
…che) The upstream merge auto-merged catalogs.py entirely to the fork's side, silently dropping upstream's T9 search (brickbots#464: KEYPAD_DIGIT_TO_CHARS, search_by_t9) and the catalog disk cache; test_t9_search (an upstream test) failed as a result. Take upstream's catalogs.py wholesale, matching the upstream main.py and menu_structure the merge already adopted. Trade- off: drops the fork's priority-fast-path / background-loader in favour of upstream's cache + loader (the agreed "take upstream catalogs" choice). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
MIN_NIXOS_VERSION was intentionally bumped 2.5.0 -> 3.0.0 (d705057, prep for 3.0), but test_software still asserted 2.5.x/2.6.x as qualifying. Shift the qualifying mock releases and _meets_min_version cases into the 3.x line so the suite matches the current minimum; below-min (2.4.0) and draft cases are unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
main.py created (and server.py switched) the pifinder_logconf.json symlink relative to the cwd, which on NixOS is the read-only /nix/store -> the service crash-looped at startup (OSError: Read-only file system). Follow the same split config.json already uses: the logconf presets stay read-only in the source tree (utils.pifinder_dir/python/logconf_*.json) and the active selection is persisted as a bare filename in the writable data dir (PiFinder_data/log_config), resolved via utils.active_logconf_path(). Storing the name (not a store-path symlink) keeps the choice valid across upgrades (which GC old store paths) and reboots. No NixOS workaround needed; also removes the write-to-source-tree antipattern on Raspberry Pi OS (upstreamable). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
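The name-not-symlink split described above might look roughly like the following. It is a sketch under assumptions, not the fork's `utils.active_logconf_path()`: the function names, the `logconf_default.json` preset name, and the fall-back-to-default behaviour are all hypothetical; only the idea of storing a bare filename in a writable `log_config` file comes from the commit message.

```python
from pathlib import Path

DEFAULT_LOGCONF = "logconf_default.json"  # hypothetical preset name


def save_active_logconf(data_dir: Path, preset_name: str) -> None:
    """Persist only the bare filename in the writable data dir."""
    data_dir.mkdir(parents=True, exist_ok=True)
    # Path(...).name strips any directory part, so no store path is ever saved.
    (data_dir / "log_config").write_text(Path(preset_name).name)


def resolve_active_logconf(source_dir: Path, data_dir: Path) -> Path:
    """Resolve the stored name against the (possibly read-only) source tree."""
    marker = data_dir / "log_config"
    name = marker.read_text().strip() if marker.exists() else DEFAULT_LOGCONF
    candidate = source_dir / name
    # Assumed behaviour: fall back to the default preset if the stored one is gone.
    return candidate if candidate.exists() else source_dir / DEFAULT_LOGCONF
```

Because only the name is stored, the selection is re-resolved against whatever source tree is current, which is why it survives an upgrade that replaces the old store path.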
The upstream merge kept nixos's outer tetra3_dir (python/PiFinder/tetra3)
while taking upstream's solver.py, which built the DB path as
tetra3_dir/data/default_database.npz. But the submodule nests the package
at tetra3/tetra3, so the DB is actually at tetra3/tetra3/data -> the solver
crashed at startup with FileNotFoundError.
Load it by its canonical name, Tetra3("default_database"); tetra3 resolves
the bundled DB from its own package data dir regardless of the inner/outer
layout. Validated on-device (loads from python/tetra3/data).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
release.yml now installs Nix + Attic like build.yml and pushes the release closure to a dedicated, never-GC'd `pifinder-release` cache. Devices and the migration first-boot trust it ahead of the dev `pifinder` cache and cache.nixos.org. Removes the last Cachix usage from active config. Docs (RELEASE.md, ADR 0004, NIXOS_STATUS.md) document the dev-vs-retained-release two-cache split and the per-cache retention caveat. The pifinder-release trusted-public-key is a placeholder until the cache is bootstrapped server-side. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The 6 .direnv/ nix-direnv cache files were tracked but regenerate on every direnv reload, so rebases baked divergent copies into the nixos stack. Gitignore + untrack stops that churn going forward. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…erivations
Replace the 524-line nixos/pkgs/python-packages.nix (26 manually packaged PyPI deps, each with hand-chased hashes and build patches) with a uv-managed workspace realized into the Nix store via uv2nix.
Changes:
* Deps declared in python/pyproject.toml, pinned in python/uv.lock (117 pkgs)
* nixos/pkgs/uv-python.nix builds the runtime/dev virtualenvs; the 5 native packages (python-libinput, python-prctl, python-pam, dbus-python, pygobject) keep their build patches as uv2nix overrides
* flake.nix: add pyproject-nix/uv2nix/pyproject-build-systems inputs, thread via specialArgs, devShell uses the uv2nix devEnv
* libcamera Python bindings stay a Nix overlay (not on PyPI)
All four nixosConfigurations + the devShell evaluate; the aarch64 build is to be validated by CI.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…cache DeterminateSystems/magic-nix-cache-action now returns HTTP 418 and the GitHub Actions cache rate-limits it (Twirp ResourceExhausted, "rate limit exceeded"), so `nix develop` cannot fetch the dev environment and every lint/test/type-check job fails before ruff/pytest/mypy even run — which all testable PRs inherit. Mirror build.yml/release.yml and substitute from the self-hosted Attic cache cache.pifinder.eu (ADR 0004) instead, falling back to cache.nixos.org when ATTIC_TOKEN is unavailable (e.g. fork PRs) so the job never hard-fails. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
state.py imported timezonefinder and built a TimezoneFinder() in SharedState.__init__, and plot.py imported pandas at module level. Both load during boot — SharedState is constructed at startup, and plot.py is pulled in via menu_structure -> UIChart. Defer the TimezoneFinder construction to the first set_location(), and pandas to the plot functions that use it, so neither blocks startup. comets.py is intentionally left unchanged: its module-level 'from skyfield.data import mpc' imports pandas regardless, so deferring pandas there has no effect without also deferring skyfield. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
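The deferral pattern described above can be sketched as follows. This is an illustration, not the actual `state.py`: `SharedStateSketch`, `_default_finder_factory`, and the `finder_factory` parameter are hypothetical names introduced so the example runs without `timezonefinder` installed. The only library call assumed is `TimezoneFinder().timezone_at(lng=..., lat=...)`.

```python
from typing import Callable, Optional


def _default_finder_factory():
    # Imported here rather than at module level, so importing this module
    # does not load timezonefinder or its dataset.
    from timezonefinder import TimezoneFinder

    return TimezoneFinder()


class SharedStateSketch:
    """Hypothetical, trimmed-down stand-in for SharedState."""

    def __init__(self, finder_factory: Callable[[], object] = _default_finder_factory):
        self._finder_factory = finder_factory
        self._tz_finder = None  # nothing heavy is built at startup
        self.timezone: Optional[str] = None

    def set_location(self, lat: float, lon: float) -> None:
        if self._tz_finder is None:
            # The first call pays the construction cost, after boot.
            self._tz_finder = self._finder_factory()
        self.timezone = self._tz_finder.timezone_at(lng=lon, lat=lat)
```

The trade-off is that the first `set_location()` call becomes slower; the cost is moved off the startup path rather than removed.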
Re-implements the salvageable part of #378 (boot-speed), cleanly on top of the `nixos` branch.
#378's perf work was lost during that branch's own internal merges: its HEAD no longer contains the lazy-import changes, and they were never merged upstream either. This brings back the part that still matters for startup time.
What this defers
- timezonefinder (`state.py`): `SharedState.__init__` eagerly imported `timezonefinder` and constructed `TimezoneFinder()` (which loads its dataset) at startup. It's now constructed lazily on the first `set_location()`, i.e. after boot.
- pandas (`plot.py`): imported at module level. `plot.py` sits on the startup path (`menu_structure` → `UIChart` → `plot`), so this loaded pandas during boot. Deferred into the four functions that actually use it.
`comets.py` is intentionally left unchanged: its module-level `from skyfield.data import mpc` imports pandas regardless, so deferring pandas there is a no-op without also deferring skyfield (out of scope here).
Verification
- `state.py` no longer loads `timezonefinder` (confirmed at runtime).
- `plot.py`'s only module-level pandas reference is removed; `skyfield.data.hipparcos` already imports pandas lazily, and nothing else in `plot.py`'s import chain pulls pandas.
- `py_compile` clean on all touched files.
🤖 Generated with Claude Code
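One way to check at runtime that an import no longer drags in a heavy dependency is to import the module in a fresh interpreter and inspect `sys.modules`. The helper below is a sketch written for this description and is not part of the PR; `modules_loaded_by` is a hypothetical name, and the demo uses standard-library modules as stand-ins so it runs anywhere.

```python
import subprocess
import sys

MARKER = "LOADED:"  # prefix that identifies the probe's own output line


def modules_loaded_by(import_stmt, watched):
    """Run import_stmt in a fresh interpreter; return the watched modules it loaded."""
    probe = (
        "import sys\n"
        f"{import_stmt}\n"
        f"print({MARKER!r} + ','.join(m for m in {list(watched)!r} if m in sys.modules))\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", probe], capture_output=True, text=True, check=True
    )
    # The imported module may print too, so pick out the marker line.
    for line in reversed(result.stdout.splitlines()):
        if line.startswith(MARKER):
            names = line[len(MARKER):]
            return set(names.split(",")) if names else set()
    return set()


# Stand-in demo: importing csv loads csv but not wave.
print(modules_loaded_by("import csv", ["csv", "wave"]))
```

Run from the project's own environment, a call such as `modules_loaded_by("import PiFinder.state", ["timezonefinder", "pandas"])` would show whether either library is still loaded at import time; the module path `PiFinder.state` is an assumption about the package layout.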